COVID19 Data Analysis Using Python

In this project we use a COVID-19 dataset containing the cumulative number of confirmed cases, recoveries, and deaths for each country. We prepare the dataset to answer three questions: What does the global spread of the virus look like? How intense has the spread been in individual countries? Did national lockdowns and self-isolation measures actually affect COVID-19 transmission? We use Plotly, a visualization library for Python, to plot graphs that help answer these questions.

Loading libraries and dataset

In [1]:
pip install plotly==4.10.0
Requirement already satisfied: plotly==4.10.0 in /Users/Yifanlin/anaconda3/lib/python3.7/site-packages (4.10.0)
Requirement already satisfied: retrying>=1.3.3 in /Users/Yifanlin/anaconda3/lib/python3.7/site-packages (from plotly==4.10.0) (1.3.3)
Requirement already satisfied: six in /Users/Yifanlin/anaconda3/lib/python3.7/site-packages (from plotly==4.10.0) (1.12.0)
Note: you may need to restart the kernel to use updated packages.
In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import time
In [6]:
# Data urls
dataset_url ="https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv"
df = pd.read_csv(dataset_url)
df.head()
Out[6]:
         Date      Country  Confirmed  Recovered  Deaths
0  2020-01-22  Afghanistan          0          0       0
1  2020-01-23  Afghanistan          0          0       0
2  2020-01-24  Afghanistan          0          0       0
3  2020-01-25  Afghanistan          0          0       0
4  2020-01-26  Afghanistan          0          0       0
In [7]:
df.tail()
Out[7]:
             Date   Country  Confirmed  Recovered  Deaths
48875  2020-10-03  Zimbabwe       7885       6327     228
48876  2020-10-04  Zimbabwe       7888       6359     228
48877  2020-10-05  Zimbabwe       7898       6424     228
48878  2020-10-06  Zimbabwe       7915       6440     229
48879  2020-10-07  Zimbabwe       7919       6441     229
In [8]:
df.shape
Out[8]:
(48880, 5)

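Visualizing the Global Spread of COVID-19 from the First Day of the Pandemic

A choropleth map, animated by date, shows the global spread of COVID-19 from the first day of the pandemic. Rows with no confirmed cases are dropped first, so each country appears on the map only from the date of its first confirmed case.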

In [9]:
df = df[df.Confirmed > 0]
In [10]:
df.head()
Out[10]:
          Date      Country  Confirmed  Recovered  Deaths
33  2020-02-24  Afghanistan          1          0       0
34  2020-02-25  Afghanistan          1          0       0
35  2020-02-26  Afghanistan          1          0       0
36  2020-02-27  Afghanistan          1          0       0
37  2020-02-28  Afghanistan          1          0       0
In [12]:
df[df.Country == "Australia"].tail()
Out[12]:
            Date    Country  Confirmed  Recovered  Deaths
2335  2020-10-03  Australia      27135      24864     894
2336  2020-10-04  Australia      27148      24888     894
2337  2020-10-05  Australia      27173      24890     895
2338  2020-10-06  Australia      27181      24915     897
2339  2020-10-07  Australia      27206      24937     897
In [13]:
fig = px.choropleth(df, locations = "Country", locationmode="country names", color = "Confirmed", 
                    animation_frame = "Date")
fig.update_layout(title_text = "Novel coronavirus (COVID19) cases worldwide")
fig.show()
In [14]:
fig = px.choropleth(df, locations = "Country", locationmode="country names", color = "Deaths", 
                    animation_frame = "Date")
fig.update_layout(title_text = "Coronavirus (COVID-19) deaths worldwide")
fig.show()
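
The maps show where the virus spread, but not how intense the spread was in each country. One possible way to put a number on that, sketched here on a small made-up table rather than the real dataset, is to take the largest day-over-day rise in confirmed cases per country. The helper name `peak_daily_cases` and the toy values are assumptions for illustration only.

```python
import pandas as pd

# Tiny synthetic stand-in for the long-format dataset (values are made up).
toy = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-02", "2020-03-03"] * 2,
    "Country": ["A"] * 3 + ["B"] * 3,
    "Confirmed": [1, 4, 6, 10, 10, 25],
})

def peak_daily_cases(frame):
    """Return the largest day-over-day rise in Confirmed for each country."""
    ordered = frame.sort_values(["Country", "Date"])
    # diff() is taken within each country so totals never leak across countries
    daily_new = ordered.groupby("Country")["Confirmed"].diff()
    # NaN (the first day of each country) is ignored by max()
    return daily_new.groupby(ordered["Country"]).max().sort_values(ascending=False)

peaks = peak_daily_cases(toy)
print(peaks)
```

Applied to the notebook's `df`, the same function should rank countries by their worst single day, although reporting corrections in the source data could distort individual values.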

Impact of the National Lockdown on COVID-19 Transmission in Australia

COVID-19 spread before and after lockdown in Australia

Visualizing the impact of the national lockdown in Australia on the spread of the virus.

In [15]:
aus_lockdown_start_date = '2020-03-23'
aus_lockdown_a_month_later = '2020-04-26'
vic_lockdown_stage3_start_date = '2020-07-08'
vic_lockdown_stage4_end_date ='2020-09-13'
In [16]:
df_aus = df[df.Country == 'Australia']
df_aus.head()
Out[16]:
            Date    Country  Confirmed  Recovered  Deaths
2084  2020-01-26  Australia          4          0       0
2085  2020-01-27  Australia          5          0       0
2086  2020-01-28  Australia          5          0       0
2087  2020-01-29  Australia          6          0       0
2088  2020-01-30  Australia          9          2       0
In [17]:
# assign() returns a new DataFrame, so the column is not written to a slice of df
# (this avoids pandas' SettingWithCopyWarning)
df_aus = df_aus.assign(**{'Infection Rate': df_aus.Confirmed.diff()})
df_aus.head()
Out[17]:
            Date    Country  Confirmed  Recovered  Deaths  Infection Rate
2084  2020-01-26  Australia          4          0       0             NaN
2085  2020-01-27  Australia          5          0       0             1.0
2086  2020-01-28  Australia          5          0       0             0.0
2087  2020-01-29  Australia          6          0       0             1.0
2088  2020-01-30  Australia          9          2       0             3.0
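
Daily differences of a cumulative count tend to be noisy, for example because of weekend reporting gaps. As an optional extra that the notebook itself does not apply, a trailing 7-day mean can smooth the series. The sketch below uses made-up numbers, and the column names `Daily New` and `Daily New (7d mean)` are hypothetical.

```python
import pandas as pd

# Made-up cumulative counts for ten consecutive days.
toy = pd.DataFrame({
    "Date": pd.date_range("2020-03-01", periods=10, freq="D"),
    "Confirmed": [0, 2, 2, 9, 12, 12, 20, 30, 31, 45],
})

# Daily new cases: first difference of the cumulative series.
toy["Daily New"] = toy["Confirmed"].diff()

# Trailing 7-day mean; it stays NaN until seven daily values are available.
toy["Daily New (7d mean)"] = toy["Daily New"].rolling(window=7).mean()

print(toy)
```

The smoothed column could be plotted alongside the raw one with `px.line` in the same way as the other series in this notebook.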
In [30]:
fig = px.line(df_aus,x='Date', y=['Confirmed'])
fig.show()
In [19]:
fig = px.line(df_aus, x='Date', y='Deaths', title = 'Number of deaths due to COVID-19 in Australia')
fig.show()

Infection Rate Before and After Lockdown in Australia

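Visualizing the impact of the national lockdown in Australia on the infection rate. The 'Infection Rate' column computed above is the number of new confirmed cases per day, i.e. the first difference of the cumulative count, rather than a rate per head of population.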

In [20]:
fig = px.line(df_aus, x= 'Date', y = 'Infection Rate', title = 'Before and after lockdown in Australia')
fig.add_shape(
    dict(
    type = "line",
    x0 = aus_lockdown_start_date,
    y0 = 0,
    x1 = aus_lockdown_start_date,
    y1 = df_aus['Infection Rate'].max(),
    line = dict(color='red',width = 2)
    )
)

fig.add_annotation(
    dict(
    x= aus_lockdown_start_date,
    y= df_aus['Infection Rate'].max(),
    text = 'starting date of the national lockdown'
    )
)
fig.add_shape(
    dict(
    type = "line",
    x0 = aus_lockdown_a_month_later,
    y0 = 0,
    x1 = aus_lockdown_a_month_later,
    y1 = df_aus['Infection Rate'].max(),
    line = dict(color='orange',width = 2)
    )
)

fig.add_annotation(
    dict(
    x= aus_lockdown_a_month_later,
    y= 500,
    text = 'ending date of the national lockdown'
    )
)
fig.add_shape(
    dict(
    type = "line",
    x0 = vic_lockdown_stage3_start_date ,
    y0 = 0,
    x1 = vic_lockdown_stage3_start_date,
    y1 = df_aus['Infection Rate'].max(),
    line = dict(color='red',width = 2)
    )
)

fig.add_annotation(
    dict(
    x= vic_lockdown_stage3_start_date,
    y= df_aus['Infection Rate'].max(),
    text = 'starting date of the vic stage 3 lockdown'
    )
)
fig.add_shape(
    dict(
    type = "line",
    x0 = vic_lockdown_stage4_end_date,
    y0 = 0,
    x1 = vic_lockdown_stage4_end_date,
    y1 = df_aus['Infection Rate'].max(),
    line = dict(color='orange',width = 2)
    )
)

fig.add_annotation(
    dict(
    x= vic_lockdown_stage4_end_date,
    y= 680,
    text = 'ending date of the vic stage 4 lockdown'
    )
)
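
The four line-and-label pairs above follow the same pattern, so they could be generated in a loop. The sketch below builds the same kind of shape and annotation dicts in plain Python. The helper name `vertical_marker` and the `y_top` value are placeholders introduced for illustration, and the labels here are all placed at the top of the line, whereas the cell above uses fixed heights for the two end dates.

```python
def vertical_marker(date, y_top, color, label, label_y=None):
    """Build one line-shape dict and one annotation dict for a given date."""
    shape = dict(type="line", x0=date, y0=0, x1=date, y1=y_top,
                 line=dict(color=color, width=2))
    # The label sits at the top of the line unless a height is given.
    annotation = dict(x=date, y=y_top if label_y is None else label_y, text=label)
    return shape, annotation

# Dates, colors and labels as used in the cell above.
markers = [
    ("2020-03-23", "red", "starting date of the national lockdown"),
    ("2020-04-26", "orange", "ending date of the national lockdown"),
    ("2020-07-08", "red", "starting date of the vic stage 3 lockdown"),
    ("2020-09-13", "orange", "ending date of the vic stage 4 lockdown"),
]

# Placeholder height; in the notebook this would be df_aus['Infection Rate'].max().
y_top = 700

shapes, annotations = [], []
for date, color, label in markers:
    shape, annotation = vertical_marker(date, y_top, color, label)
    shapes.append(shape)
    annotations.append(annotation)

# With a Plotly figure `fig` in scope, the dicts could then be attached with:
#     for s in shapes: fig.add_shape(s)
#     for a in annotations: fig.add_annotation(a)
print(len(shapes), "shapes and", len(annotations), "annotations built")
```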

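Death Rate Before and After Lockdown in Australia

Visualizing the impact of the national lockdown in Australia on the death rate, taken here as the number of new deaths reported each day (the first difference of the cumulative count).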
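
As a rough numeric complement to the plots, the average daily change inside a date window can be compared before and after a lockdown date. This is only a sketch on made-up numbers: the helper `mean_daily_change` is hypothetical, it assumes the rows are already sorted by date, and a difference between windows does not by itself show that the lockdown caused it.

```python
import pandas as pd

def mean_daily_change(frame, column, start, end):
    """Average day-over-day change of a cumulative column for start <= Date <= end."""
    dates = pd.to_datetime(frame["Date"])
    daily = frame[column].diff()  # assumes rows are in date order
    mask = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
    return daily[mask].mean()

# Made-up cumulative deaths for eight consecutive days.
toy = pd.DataFrame({
    "Date": pd.date_range("2020-03-20", periods=8, freq="D").strftime("%Y-%m-%d"),
    "Deaths": [0, 1, 3, 6, 10, 12, 13, 13],
})

before = mean_daily_change(toy, "Deaths", "2020-03-21", "2020-03-23")
after = mean_daily_change(toy, "Deaths", "2020-03-25", "2020-03-27")
print(before, after)
```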

In [21]:
df_aus.head()
Out[21]:
            Date    Country  Confirmed  Recovered  Deaths  Infection Rate
2084  2020-01-26  Australia          4          0       0             NaN
2085  2020-01-27  Australia          5          0       0             1.0
2086  2020-01-28  Australia          5          0       0             0.0
2087  2020-01-29  Australia          6          0       0             1.0
2088  2020-01-30  Australia          9          2       0             3.0
In [22]:
# assign() returns a new DataFrame, so the column is not written to a slice of df
df_aus = df_aus.assign(**{'Deaths Rate': df_aus.Deaths.diff()})
df_aus.head()
Out[22]:
            Date    Country  Confirmed  Recovered  Deaths  Infection Rate  Deaths Rate
2084  2020-01-26  Australia          4          0       0             NaN          NaN
2085  2020-01-27  Australia          5          0       0             1.0          0.0
2086  2020-01-28  Australia          5          0       0             0.0          0.0
2087  2020-01-29  Australia          6          0       0             1.0          0.0
2088  2020-01-30  Australia          9          2       0             3.0          0.0

Normalise the Infection Rate and Death Rate
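
Daily cases and daily deaths differ by orders of magnitude, so each series is divided by its own maximum to bring both onto a common scale with a peak of 1 before they share one chart. A minimal sketch of that scaling on made-up values:

```python
import pandas as pd

# Made-up daily values on very different scales.
toy = pd.DataFrame({
    "Infection Rate": [10.0, 40.0, 20.0],
    "Deaths Rate": [0.0, 2.0, 1.0],
})

# Dividing by toy.max() scales every column by its own maximum,
# so each column peaks at exactly 1.0.
normalised = toy / toy.max()

print(normalised)
```

After this scaling only the shapes of the two curves are comparable; the absolute numbers are no longer visible on the chart.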

In [23]:
df_aus['Infection Rate'] = df_aus['Infection Rate']/df_aus['Infection Rate'].max()
df_aus['Deaths Rate'] = df_aus['Deaths Rate']/df_aus['Deaths Rate'].max()
fig = px.line(df_aus,x='Date', y=['Infection Rate','Deaths Rate'], title = 'Before and After lockdown in Australia')
fig.add_shape(
    dict(
    type = "line",
    x0 = aus_lockdown_start_date,
    y0 = 0,
    x1 = aus_lockdown_start_date,
    y1 = df_aus['Infection Rate'].max(),
    line = dict(color='Green',width = 2)
    )
)

fig.add_annotation(
    dict(
    x= aus_lockdown_start_date,
    y= df_aus['Infection Rate'].max(),
    text = 'starting date of the national lockdown'
    )
)
fig.add_shape(
    dict(
    type = "line",
    x0 = aus_lockdown_a_month_later,
    y0 = 0,
    x1 = aus_lockdown_a_month_later,
    y1 = df_aus['Infection Rate'].max(),
    line = dict(color='orange',width = 2)
    )
)

fig.add_annotation(
    dict(
    x= aus_lockdown_a_month_later,
    y= 0.95,
    text = 'ending date of the national lockdown'
    )
)
fig.add_shape(
    dict(
    type = "line",
    x0 = vic_lockdown_stage3_start_date ,
    y0 = 0,
    x1 = vic_lockdown_stage3_start_date,
    y1 = df_aus['Infection Rate'].max(),
    line = dict(color='Green',width = 2)
    )
)

fig.add_annotation(
    dict(
    x= vic_lockdown_stage3_start_date,
    y= df_aus['Infection Rate'].max(),
    text = 'starting date of the vic stage 3 lockdown'
    )
)
fig.add_shape(
    dict(
    type = "line",
    x0 = vic_lockdown_stage4_end_date,
    y0 = 0,
    x1 = vic_lockdown_stage4_end_date,
    y1 = df_aus['Infection Rate'].max(),
    line = dict(color='orange',width = 2)
    )
)

fig.add_annotation(
    dict(
    x= vic_lockdown_stage4_end_date,
    y= 0.95,
    text = 'ending date of the vic stage 4 lockdown'
    )
)